[00:09.240 --> 00:15.280]  Hi, so my name is Tim Yardley. I'm here to talk about some work that I've done under
[00:15.800 --> 00:23.210]  the DARPA RADICS program. Whoa, I'm getting audio from somewhere. Hold on.
[00:35.120 --> 00:38.600]  All right, sorry about that. I was getting feedback from the live channel.
[00:39.420 --> 00:45.500]  So I am here to talk about building a cyber physical testbed and how we've used that to
[00:45.500 --> 00:52.120]  support black start restoration under cyber fire. So this has been part of the DARPA RADICS
[00:52.120 --> 00:56.540]  program. The RADICS program is Rapid Attack Detection, Isolation, and Characterization
[00:56.540 --> 00:59.760]  Systems. And I'll talk a little bit about what that is here in a minute.
[01:01.300 --> 01:09.120]  So just for sort of coverage and statement, the testbed work that I'm going to talk about
[01:09.120 --> 01:13.620]  was funded under DARPA's Rapid Attack Detection, Isolation, and Characterization
[01:13.620 --> 01:18.120]  Systems program. I work for the University of Illinois. I'm the principal investigator
[01:18.940 --> 01:25.420]  of the University of Illinois' effort under there. We also had some government SME support,
[01:25.420 --> 01:32.960]  subject matter expert support from Idaho National Lab. And then the program evaluator
[01:32.960 --> 01:38.980]  is a company called Probatech. And all three of those organizations have been critical to
[01:39.420 --> 01:44.240]  the work that I'm talking about in terms of enabling the black start restoration efforts
[01:44.240 --> 01:50.120]  and the validation of the work itself. So let's talk a little bit, give you a little background
[01:50.120 --> 01:56.120]  on me. So my name is Tim Yardley. I'm a principal research scientist at the Information Trust
[01:56.120 --> 02:03.580]  Institute at the University of Illinois. And I am also a father, a husband, and broadly a
[02:03.580 --> 02:10.880]  researcher. I've been doing work in industrial control systems for about 14 years now. And I've
[02:10.880 --> 02:20.020]  been doing work in security for, geez, probably almost 30 years now across the board. My
[02:20.020 --> 02:29.660]  background came from sort of the think tank IRC eras back in the early days of EFnet, groups that,
[02:29.660 --> 02:36.560]  let's say, like to explore and like to figure out things, but also cause some havoc
[02:36.560 --> 02:42.560]  in the process. I'm one of the original members of w00w00 and a variety of other security think
[02:42.560 --> 02:50.660]  tanks over the years. And I've sort of been in the realm of computer security for
[02:50.660 --> 02:56.780]  quite some time. I've been working in academia for about the past 12 years. And prior to that,
[02:56.780 --> 03:02.340]  I worked in industry supporting a variety of different efforts. So the DARPA RADICS effort
[03:02.340 --> 03:08.520]  itself, let's talk a little bit about what it is. So the program is designed to
[03:09.240 --> 03:18.100]  build technologies that fill a gap. And that gap really is the notion that if we are
[03:18.100 --> 03:24.900]  attacked by nation states, we are ill-prepared to be able to determine exactly what happened,
[03:24.900 --> 03:28.600]  where it happened, and how it happened, and get rid of it as fast as possible.
[03:28.980 --> 03:36.280]  So DARPA stood up this program to build technology that advanced that state of,
[03:36.280 --> 03:43.260]  let's say, preparedness, and then to evaluate that technology on a realistic facility. And so
[03:43.260 --> 03:49.640]  our role in that program was to build that realistic facility. We'll talk in more detail
[03:49.640 --> 03:56.460]  about the objectives of the program itself. So these are all public slides from DARPA
[03:56.960 --> 04:02.640]  that are here. But the key objective of the program itself is to enable black start recovery
[04:02.640 --> 04:07.380]  of the power grid amidst a cyber attack on the energy sector's critical infrastructure. In this
[04:07.380 --> 04:14.260]  particular case, the critical infrastructure is the electric power grid. So in a prevention
[04:15.400 --> 04:21.280]  sort of state, you would say, okay, well, let's defend and detect what's going on.
[04:21.280 --> 04:27.180]  But this program starts from the adversary has been successful. So you have a complete blackout,
[04:27.180 --> 04:32.200]  no power anywhere, and you have to figure out exactly what happened, how it happened, etc.
[04:33.640 --> 04:40.860]  So the devices that are involved, you have to figure out what devices can be trusted,
[04:40.860 --> 04:47.820]  because you don't know what's compromised and what is safe or what is operating per norm.
[04:47.880 --> 04:55.760]  And many of the physical infrastructures are effectively controlled by intelligent devices.
[04:55.760 --> 05:01.320]  So these are the ICS devices that you have to explore on, let's say, the cyber side of the
[05:01.320 --> 05:09.300]  equation that are controlling the physical aspects of the grid. And these assets are spread throughout
[05:09.300 --> 05:15.460]  the country. So how can you do this in a scalable way? How can you deeply and forensically poke at
[05:15.460 --> 05:22.220]  these embedded devices? How can you figure out exactly what is trustable, what is not trustable,
[05:22.220 --> 05:26.240]  what was attacked, how it was attacked, and then get rid of it as fast as possible.
[05:26.240 --> 05:32.820]  The goal is to do so across the entire United States within seven days. And that's to isolate,
[05:32.820 --> 05:40.480]  characterize, and restore any crank paths necessary to bringing the grid for the United States back online.
[05:42.360 --> 05:49.600]  So let's talk about the manifestation of this. So in the first year or so, year and a half of the
[05:49.600 --> 05:55.020]  program, we built exercise environments that were run at the University of Illinois that were
[05:55.020 --> 06:01.160]  getting people's feet wet and getting the technology ready to explore, let's say, the beginning
[06:01.160 --> 06:08.200]  edges of the problem space. And as we evolved both the technology and the program,
[06:08.200 --> 06:16.140]  we had to turn the corner. So people will only believe what happens in a lab to the extent that
[06:16.140 --> 06:23.520]  it's, oh, that was in a lab environment. Oh, that's fine if you do it just on science. But that's not
[06:23.520 --> 06:30.340]  the real world. So we took it to a federal island, which is called Plum Island. It's the home of the
[06:30.340 --> 06:40.400]  Animal Disease Center, controlled and owned by DHS S&T, and is a former site of Fort Terry in
[06:40.400 --> 06:45.160]  the Spanish-American War and a bunch of other stuff. The size of the space that we're controlling
[06:45.160 --> 06:50.920]  there is roughly the size of Central Park. And we have built electrical infrastructure on that
[06:50.920 --> 06:57.400]  island. And so I'll talk about the most recent incarnation of it that was completed in November
[06:57.400 --> 07:03.920]  of 2019. That was the sixth exercise of the program. We have one more exercise still to do.
[07:03.920 --> 07:10.000]  COVID has postponed that a little bit, but it's still pending. So far it has not been fully canceled.
[07:10.440 --> 07:17.660]  And we look to sort of do that last hurrah of the program and validate the advancements from
[07:17.660 --> 07:25.740]  the last exercise till now. So in the island infrastructure, we basically have built three
[07:25.740 --> 07:31.100]  full utilities. And those utilities are represented by gear that we'll talk about here
[07:31.100 --> 07:36.620]  in a moment. They have an emergency operations center for each utility. And then there's a
[07:36.620 --> 07:41.340]  regional coordinator that's trying to coordinate the actions amongst the different utilities as
[07:41.340 --> 07:48.480]  it goes through. We have the National Guard involved. We have the RADICS performers involved
[07:48.480 --> 07:54.980]  and lots of other entities that help establish communications in a variety of ways and really set
[07:54.980 --> 08:03.940]  up what is a fairly austere environment from the beginning to enable the capture and execution of
[08:03.940 --> 08:10.660]  these exercises. So utility A has five low voltage substations. And I'll talk about what low voltage
[08:10.660 --> 08:17.760]  means to us. And one high voltage substation. It also had one generator, which is the crank path,
[08:17.760 --> 08:23.080]  in essence, that's getting to the high value assets. Utility B had seven low voltage substations
[08:23.080 --> 08:28.560]  along with three high voltage substations and that crank path had a critical national asset
[08:28.560 --> 08:36.320]  on it that needed to be up and maintained no matter what. Utility C was similar to utility A.
[08:36.320 --> 08:41.880]  It had five low voltage substations and one high voltage substation as well as one generator. And
[08:41.880 --> 08:47.760]  so each of these utilities differed in the physical topology and layout of the substations.
[08:47.760 --> 08:52.300]  They also differed in the equipment that was deployed across each of those substations.
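The three-utility layout just described can be written down as data. This is only an illustrative sketch: the class and field names are hypothetical, and the only values used are the substation and generator counts stated in the talk. A generator count for utility B is not given, so it is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Utility:
    """One exercise utility, using only the counts stated in the talk."""
    name: str
    low_voltage_substations: int
    high_voltage_substations: int
    generators: Optional[int] = None   # None means "not stated in the talk"
    has_critical_asset: bool = False

    @property
    def substations(self) -> int:
        # Total substations on this utility's crank path.
        return self.low_voltage_substations + self.high_voltage_substations


UTILITIES = [
    Utility("A", low_voltage_substations=5, high_voltage_substations=1, generators=1),
    Utility("B", low_voltage_substations=7, high_voltage_substations=3, has_critical_asset=True),
    Utility("C", low_voltage_substations=5, high_voltage_substations=1, generators=1),
]

for utility in UTILITIES:
    print(utility.name, utility.substations, "substations")
```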
[08:53.700 --> 09:01.060]  So the program is really broken into multiple different technical areas. You could look at it
[09:01.060 --> 09:06.940]  as four or five depending on how you want to count. The first area is situational awareness.
[09:06.940 --> 09:11.120]  And so when they deploy on this infrastructure that we've built, they are trying to figure out
[09:11.120 --> 09:16.960]  what happened or what is currently happening on the infrastructure to get as much situational
[09:16.960 --> 09:21.520]  awareness as you can provide. And the reason for that is you can't really trust the devices once
[09:21.520 --> 09:27.000]  they've been attacked to be telling you the right things. And you are in a blackout scenario,
[09:27.000 --> 09:34.120]  so you may not have visibility in much of the grid environment. Network isolation is technical
[09:34.120 --> 09:42.400]  area two. And technical area two is focused on taking the communications that are no longer
[09:42.400 --> 09:50.680]  necessarily trustworthy and expanding those into a realm of trustable or not. And so by that I mean
[09:51.240 --> 09:57.560]  point A to point B may have talked to each other before. And it used to be a dedicated private link
[09:57.560 --> 10:05.180]  or whatever it may be. But now traffic that's going across there is not reflecting what it
[10:05.180 --> 10:09.400]  did previously. So maybe somebody's man-in-the-middle-ing it. Maybe somebody is manipulating
[10:09.400 --> 10:15.120]  it. Maybe it's getting black-holed in some other way. Unknown operation, right? So how do you take
[10:15.120 --> 10:21.740]  the outputs from A and get it to B in a way that's trustworthy and secure and such that it
[10:21.740 --> 10:28.580]  cannot be modified in the middle or that it is evident when it is modified in the middle? So
[10:28.580 --> 10:35.640]  that's the area of research for technical area two. Technical area three is threat analysis.
[10:35.640 --> 10:42.900]  And they really are intended to do the forensic response per se on the devices itself. So how do
[10:42.900 --> 10:49.680]  you diagnose and remove cyber threats from the embedded devices, from the different pieces that
[10:49.680 --> 10:59.560]  are involved, et cetera, as you go through the investigation part of the environment? So the
[10:59.560 --> 11:04.880]  environment itself is technical area four. And that's conducted by us at the University of
[11:04.880 --> 11:12.380]  Illinois. And that is the environment by which the exercise happens, which is on Plum Island,
[11:12.380 --> 11:18.440]  but also the environment at the University of Illinois' campus that the performers remote into
[11:18.440 --> 11:23.260]  to build out their technology, to extend their capabilities, and to investigate
[11:24.080 --> 11:29.680]  the edge cases of how the grid operates when certain things happen to these devices.
[11:30.700 --> 11:35.180]  So that's part of our central facility, and I'll talk a little bit about that. And then the island
[11:35.180 --> 11:41.960]  is basically a distributed manifestation of that central facility at Illinois. The last
[11:41.960 --> 11:47.160]  technical area is technical area five. And effectively, that is the evaluators of the
[11:47.160 --> 11:56.200]  program. And they build out how to run an exercise in this space and how to determine whether or not
[11:56.200 --> 12:03.480]  progress is being made on the technology, what the appropriate challenge levels are, the, let's say,
[12:03.480 --> 12:11.240]  pain points, or how deep or how hard the red team pushes, et cetera, as we go through the environment.
[12:11.240 --> 12:18.480]  And they technically grade technical areas one, two, three, and four as they go through that.
[12:19.980 --> 12:30.500]  So on to the next slide. RADICS exercise six was really a move from the strategic
[12:30.500 --> 12:36.600]  notion early on in the program, to an operational notion with the utilities involved, et cetera,
[12:36.600 --> 12:43.300]  to a tactical deployment on top of that. So if you look up in the upper left-hand corner,
[12:43.300 --> 12:48.060]  the strategic notion is the concept that we talked about of you have a large-scale blackout and what
[12:48.060 --> 12:55.060]  you need to accomplish. The operational notion is the building and establishment of these crank paths.
[12:55.060 --> 13:00.580]  And then the tactical is let's execute on those crank paths and figure out exactly what happened
[13:00.580 --> 13:07.420]  and how it happened. So the picture on the bottom right at the moment is drone footage
[13:08.600 --> 13:14.740]  sort of zoomed in on one of the substations. And you can see that they're built in shipping
[13:14.740 --> 13:20.520]  containers. And those shipping containers have gear inside them that I'll show you here in a moment.
[13:20.540 --> 13:26.380]  And then these containers are arranged and linked together to build the crank path. And we do that
[13:26.380 --> 13:35.320]  with basically above-ground number two SO cord that has connectors on the end that we plug into
[13:35.320 --> 13:39.780]  the individual gear that we have inside the boxes. And I'll show you what that looks like
[13:39.780 --> 13:48.080]  here in a moment. So you can see sort of a little bit of an edge here of what is inside a container.
[13:48.080 --> 13:53.980]  So this is standing inside a container looking out at the moment with the door open. So on the
[13:53.980 --> 13:58.420]  left-hand side there, you'll see what we call the relay box. On the right-hand side, you'll see what
[13:58.420 --> 14:05.180]  we call the power box, which is a skid-mounted Hoffman enclosure that controls the flow in
[14:05.180 --> 14:11.720]  essence of the substation. So that's the power grid aspect. And the controls of those grid
[14:11.720 --> 14:19.080]  components are in the relay box on the left-hand side. There's also some sensors that are up on top
[14:19.080 --> 14:23.600]  that are performer technology, providing some of the situational
[14:23.600 --> 14:28.280]  awareness and attempt to determine ground truth as we go through the environment. And I'll zoom
[14:28.280 --> 14:34.600]  in on a lot of this as we talk. So I'm going to only talk about the testbed itself. I'm not
[14:34.600 --> 14:40.120]  going to talk about any of the performer technology that's specifically built, but I'll
[14:40.120 --> 14:45.980]  dig quite deeply into the testbed, which is my area of responsibility. So the mission that we set
[14:45.980 --> 14:52.620]  out to do is to provide realistic environments that enable this cutting-edge R&D that's not yet
[14:52.620 --> 15:01.100]  done by any commercially available product or any existing research off the shelf. And then we take
[15:01.100 --> 15:06.660]  and create this environment in a way that allows us to validate the effectiveness and frankly,
[15:06.660 --> 15:14.040]  the efficiency of those tools as we go through. So the goal of the program itself is really to
[15:14.040 --> 15:20.700]  take a generational leap forward in the capabilities of testbeds. So we've been building
[15:20.700 --> 15:26.520]  cyber-physical testbeds and leveraging the cyber-physical testbeds for about 13, 14 years now
[15:26.520 --> 15:32.200]  at the University of Illinois. By many accounts, we're sort of the gold standard in terms of
[15:32.200 --> 15:40.740]  capabilities across the nation and arguably the world. But even so, when we pitched our
[15:40.740 --> 15:50.230]  capabilities for this program, our proposal and the going-in salvo of our proposal was effectively,
[15:51.160 --> 15:56.580]  we have assembled the right team to solve this problem. But what technology exists today and
[15:56.580 --> 16:01.580]  where testbeds are today across all of them that you will encounter and anyone that bids,
[16:01.580 --> 16:05.660]  all of them are woefully inadequate to be able to actually go to the level of
[16:05.660 --> 16:09.620]  realism that will be necessary to validate these tools.
[16:10.080 --> 16:16.820]  And even with us, it is an extreme long shot as to whether or not this will be achievable
[16:16.820 --> 16:22.960]  in the time frame and advancing fast enough to be able to support this post-attack analysis.
[16:23.040 --> 16:27.560]  And so let me riff on that for a second. Cyber-physical testbeds before this program started
[16:27.560 --> 16:32.580]  were primarily focused on let's either build an environment where we're looking purely at a
[16:32.580 --> 16:38.400]  physical phenomenon, or let's build an environment that's proving out a hypothesis,
[16:38.980 --> 16:45.580]  physical or cyber, and look at it from that particular angle, sort of ignoring all of the
[16:45.580 --> 16:52.120]  other details. But in this program, everything is unknown coming in. You don't know what the
[16:52.120 --> 16:57.380]  attackers did to you. You don't know even what the attackers want to do to you on the environment. So
[16:57.380 --> 17:03.600]  you have to have every piece of it as real as possible. But you can't possibly go build, you
[17:03.600 --> 17:12.800]  know, three real crank paths and 24, 27 real substations out there because it just costs way
[17:12.800 --> 17:17.560]  too much money to do. And it's a dangerous environment to be in. So how can you minimize
[17:18.380 --> 17:26.180]  the environment, maximize the safety, and also maximize the realism without running into problems
[17:26.180 --> 17:32.740]  of naysayers with simulation being involved or emulation of devices, etc. So they need to be
[17:32.740 --> 17:37.880]  able to touch it, they need to be able to feel it, they need to be able to see it, and they need
[17:37.880 --> 17:44.260]  to be able to trust that what it does and how it works is going to be reflected or reflective
[17:44.840 --> 17:52.680]  of what happens in the real world. So the outcomes of the testbed work itself, obviously we've
[17:52.680 --> 17:57.700]  created a lot of tools, we've pioneered some new techniques and methodologies, we've combined
[17:57.700 --> 18:03.400]  existing solutions, both that we've had and that others have had, to build an environment
[18:03.400 --> 18:07.800]  together. And we combined that, not just the academic knowledge that we had at the University
[18:07.800 --> 18:16.520]  of Illinois, but in partnerships with key vendors and also with asset owners and
[18:16.520 --> 18:22.580]  operators, and to build really an environment that reflected not only the real world, but
[18:22.580 --> 18:30.860]  that took a whole leap forward on its ability to evaluate research. So what is a testbed?
[18:30.860 --> 18:37.540]  So a testbed really is somebody has a need, let's call them the customer, and that need
[18:37.540 --> 18:44.300]  is to evaluate something. And so a testbed is assets, the thing maybe that they want to evaluate
[18:44.300 --> 18:49.080]  on. It's the people with the knowledge on how to build that environment in the way that
[18:49.080 --> 18:54.740]  represents the scenario they need to look at, etc. What to capture, how to capture, where to
[18:54.740 --> 19:00.860]  capture it, etc. It's the science of how to do so in a realistic way, while still enabling the
[19:00.860 --> 19:06.360]  necessary data capture that sometimes these systems inherently don't support. And then it's
[19:06.360 --> 19:12.740]  that data itself. And that data itself is what is captured from the devices, either willingly or not.
[19:12.740 --> 19:17.740]  How the system was operating, packet captures as an example of the communications that are going
[19:17.740 --> 19:23.000]  across, ground truth as to the physical telemetry of what was actually going across, not just simply
[19:23.000 --> 19:30.400]  what the devices are reporting is happening. And then a manifestation of or configuration of all
[19:30.400 --> 19:37.280]  of those things together that is provisioned out into an environment that you then do the work on.
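The distinction drawn here, between what a device reports and independently measured ground truth, can be shown with a small comparison. This is a minimal sketch under stated assumptions: the `Sample` fields, the voltage values, and the 1.0 V tolerance are all hypothetical and are not taken from the testbed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    timestamp: float       # seconds since the start of capture (hypothetical)
    reported_volts: float  # value the device itself claims
    measured_volts: float  # value from an independent ground-truth sensor


def find_discrepancies(samples: List[Sample], tolerance_volts: float = 1.0) -> List[Sample]:
    """Return samples where the device report and ground truth disagree beyond tolerance."""
    return [
        s for s in samples
        if abs(s.reported_volts - s.measured_volts) > tolerance_volts
    ]


samples = [
    Sample(0.0, reported_volts=120.0, measured_volts=119.8),
    Sample(1.0, reported_volts=120.0, measured_volts=0.0),  # device claims energized, sensor sees nothing
    Sample(2.0, reported_volts=119.5, measured_volts=119.9),
]

suspect = find_discrepancies(samples)
for s in suspect:
    print(f"t={s.timestamp}: reported {s.reported_volts} V, measured {s.measured_volts} V")
```

A mismatch does not say which side is wrong. It only marks the point in time where the device's own account can no longer be taken at face value.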
[19:37.280 --> 19:43.180]  Our capability, we can provision locally the assets in our central environment. We can
[19:43.180 --> 19:48.920]  provision portable environments like what we've built on Plum Island and deploy on Plum Island.
[19:48.920 --> 19:52.740]  And we can also provision into the cloud in a variety of different ways.
[19:53.600 --> 20:01.100]  So why a testbed? What's the value of a testbed? You've seen, obviously, the ICS Village and
[20:01.100 --> 20:06.280]  Capture the Flag stuff. But this is a little bit different. And the reason for a testbed,
[20:06.280 --> 20:10.620]  the reason for the work that we do, is that this mission-critical technology,
[20:10.620 --> 20:14.680]  and why do I call this mission-critical technology? So the technology being built
[20:14.680 --> 20:24.300]  under RADICS is intended to be, let's say, have its glass broken when we're in a blackout scenario,
[20:24.300 --> 20:29.700]  effectively, after we've been hit. That's where the real value of this technology is.
[20:29.720 --> 20:34.660]  And there's arguments, and I am one of the people that will argue this, that that technology needs
[20:34.660 --> 20:41.500]  to be used even before we're hit. But in the end, our grid is down. That's what this technology
[20:41.500 --> 20:47.140]  is built to solve. And so it is absolutely essential, if this technology is called to
[20:47.140 --> 20:53.320]  practice, that it works, and that it resolves the issue, or that it can figure out what's going on
[20:53.320 --> 20:58.560]  in the issues, etc., before we need it. Because if we don't, and we're in a national blackout,
[20:58.560 --> 21:07.560]  national disaster sort of scenario, attacked by an enemy or otherwise, how do we come back if we
[21:07.560 --> 21:13.300]  break the glass on this technology, and it's not been proven to actually work? And so you run it,
[21:13.300 --> 21:17.780]  and it's like, I can't figure out what's going on. I see nothing wrong. There's no problems here
[21:17.780 --> 21:23.940]  whatsoever, but the devices still won't turn on. The grid still is down. These devices aren't
[21:23.940 --> 21:29.620]  operating correctly. And that's a bad scenario to be in. So this is truly mission-critical
[21:29.620 --> 21:33.960]  technology, and we have to prove that it's effective before we need it. But we have to go
[21:33.960 --> 21:38.800]  beyond the theoretical testing of it. We have to put it in all sorts of scenarios across all sorts
[21:38.800 --> 21:44.600]  of different platforms to verify that it works, even in edge cases that it wasn't expecting,
[21:44.600 --> 21:49.220]  in circumstances where it's missing data, in circumstances where it's even being directly
[21:49.220 --> 21:58.280]  attacked or attempted to be, let's say, misled on what is going on. So our solution for that is
[21:58.520 --> 22:03.720]  a realistic, recomposable, and well-instrumented testbed is essential to being able to prove that
[22:03.720 --> 22:08.660]  out, because even the real grid environment cannot be manipulated in the way that we can
[22:08.660 --> 22:13.200]  with a testbed environment. And I'll talk a little bit about some of the innovation in that space.
[22:13.820 --> 22:18.840]  And frankly, everything, as I started with my opening salvo in the proposal, that existed,
[22:18.840 --> 22:23.780]  including the Illinois capabilities, wasn't good enough before this program started.
[22:24.560 --> 22:32.220]  So our approach across it is to build real systems. We also build models, looking at models
[22:32.220 --> 22:38.720]  on the cyber side and physical side, that adapt to the exercise needs, that help us build out
[22:39.400 --> 22:45.020]  behaviors and changes in the flows and the communications of systems, to operate like
[22:45.020 --> 22:50.400]  the real world or to operate in a way that an adversary may be able to manipulate.
[22:50.860 --> 22:57.520]  Everything is built in this modular way. It's adaptable and, let's say, recomposable in a
[22:57.520 --> 23:02.380]  variety of different ways. So we can take a piece and take a substation and how it's physically
[23:02.380 --> 23:07.820]  wired and physically set up at the moment, press a couple buttons, push a different configuration,
[23:07.820 --> 23:13.860]  and now it's a different substation. Same devices, different configs, different network layout,
[23:13.860 --> 23:21.140]  etc. And all of that adaptability or modularity allows us to recompose the system in any way
[23:21.140 --> 23:27.580]  necessary to present different challenges, etc., as we go through. There's also instrumentation.
[23:27.580 --> 23:33.640]  And so the instrumentation is key in that many of these systems, let's say, will cooperatively give
[23:33.640 --> 23:38.620]  you a certain amount of data. But sometimes when you're doing forensic analysis, you need to be
[23:38.620 --> 23:43.600]  able to gather things that are deeper. Or if you're trying to do experimental validation,
[23:43.600 --> 23:49.180]  you need to be able to look at things that the system inherently won't tell you. Or you need to
[23:49.180 --> 23:53.100]  look at it with much more scrutiny than what you would typically look at in the real world.
[23:53.100 --> 24:00.240]  So how do you do that and turn on an appropriate level of data output, data capture, etc.,
[24:00.240 --> 24:05.660]  but that doesn't actually affect the behavior of the systems? Because sadly, some of these systems,
[24:05.660 --> 24:11.680]  as you may know, are underpowered. And if you turn on, let's say, full complete logging of the system
[24:11.680 --> 24:17.120]  or other things, it can bog down the operation of the system. And then it no longer participates
[24:17.120 --> 24:25.840]  or acts as it would in the real world. So the last part is really knowledge. And by that, I mean,
[24:25.840 --> 24:31.900]  we don't just say, look, as academics, we're bright people, trust us, this works. We had to bring in
[24:31.900 --> 24:38.200]  real operators. We had to bring in the manufacturers, the vendors across many different
[24:38.200 --> 24:44.040]  platforms and talk through with them their best practices, their common misconfigurations that
[24:44.040 --> 24:51.000]  they see when integrators are building their platforms, the common ways that they configure
[24:51.000 --> 24:59.060]  their substations in the real world for the asset owners, etc., to both cause, let's say,
[24:59.060 --> 25:03.340]  human error to happen in ways that people accidentally misconfigure things,
[25:03.340 --> 25:08.180]  but also to mimic as closely as possible how people are actually configuring these in practice.
[25:08.200 --> 25:14.720]  And that is to get the right level of protection, the right level of output of data, and even notions
[25:14.720 --> 25:21.260]  of like, okay, what does a CIP-compliant substation look like in terms of what it is logging or not,
[25:21.260 --> 25:26.340]  versus one that's not. So that all comes together in that knowledge area.
[25:27.220 --> 25:36.100]  On the innovation side, we had to innovate quite a bit. And so the, let's say, orchestration or
[25:36.100 --> 25:44.520]  automation stuff that we had previously was good enough for research, but it made a lot of
[25:44.520 --> 25:49.400]  assumptions. And so by that, I mean, it would operate in a centralized environment, but when
[25:49.400 --> 25:55.520]  it tells something to be reconfigured or tries to control something, it expects that, A, the device
[25:55.520 --> 26:02.900]  is reachable, B, that it has access to that device, that it has the credentials to get on that device.
[26:02.900 --> 26:12.160]  It assumes, I guess, C, that it knows what the state of that device is. And then lastly,
[26:12.160 --> 26:16.140]  everything that it did before, it also assumed that the device was trustworthy
[26:16.740 --> 26:23.820]  and in a known sort of condition. So we had to, let's say, break down all of those assumptions
[26:24.340 --> 26:32.240]  and operate in a way that, let's say, didn't rely on any of those existing assumptions.
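One way to picture dropping those assumptions is an orchestration step that checks each one explicitly and reports which one failed, instead of taking reachability, access, known state, and trustworthiness for granted. The sketch below is hypothetical throughout: `DeviceView`, `Outcome`, and `try_reconfigure` are illustrative names, not the testbed's actual tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    APPLIED = "applied"
    UNREACHABLE = "unreachable"
    NO_ACCESS = "no_access"
    STATE_UNKNOWN = "state_unknown"
    UNTRUSTED = "untrusted"


@dataclass
class DeviceView:
    """What the orchestrator currently knows about one device (hypothetical fields)."""
    reachable: bool
    credentials_valid: bool
    observed_state: Optional[str]  # None means the state could not be read
    integrity_verified: bool       # e.g. configuration matched a known-good reference


def try_reconfigure(view: DeviceView) -> Outcome:
    """Check each former assumption in turn; never presume the step succeeded."""
    if not view.reachable:
        return Outcome.UNREACHABLE
    if not view.credentials_valid:
        return Outcome.NO_ACCESS
    if view.observed_state is None:
        return Outcome.STATE_UNKNOWN
    if not view.integrity_verified:
        return Outcome.UNTRUSTED
    # Only here would a real orchestrator push the new configuration.
    return Outcome.APPLIED


print(try_reconfigure(DeviceView(True, True, "breaker_open", True)))
print(try_reconfigure(DeviceView(True, True, None, True)))
```

Returning a named outcome rather than raising or silently retrying lets the caller decide what to do next, for example queueing the change until the substation has power again.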
[26:32.240 --> 26:40.500]  We applied a bunch of research as well. Obviously, we had over a decade of work in the
[26:40.500 --> 26:47.460]  prevention space and in the detection space and remediation space at the University of Illinois.
[26:47.920 --> 26:52.860]  And we had used that in the testbed in a variety of ways. We had used it or proved it out in the
[26:52.860 --> 26:57.980]  testbed, some of which is even in formal companies now transitioned either to big vendors or as
[26:57.980 --> 27:02.840]  startups. And we had to apply that in sort of a different way as part of the testbed,
[27:02.840 --> 27:09.100]  not to just say, okay, look, here's the testbed environment. But for instance, if we could reach
[27:09.100 --> 27:14.680]  deeper into a device, then we used some of that research that we had to dig deeper into those
[27:14.680 --> 27:19.520]  devices and pull out and extract information that supports the validation of the technology
[27:20.080 --> 27:29.300]  without affecting the performance of the device itself. Our team, in particular, myself and a few
[27:29.300 --> 27:36.580]  others have gone really deep on some of these platforms over the years. And so we brought a
[27:36.580 --> 27:43.540]  wide variety of devices to the table that we already knew quite a bit about on the inside.
[27:43.540 --> 27:48.740]  Sometimes even, let's say, one could argue more than what the vendors know about their own devices.
[27:49.600 --> 27:54.900]  In terms of the knowledge and ways that we could poke around inside of these platforms.
[27:55.440 --> 28:00.860]  We had to also build, because we needed to deploy on an austere environment,
[28:00.860 --> 28:07.240]  if you don't have it, then you better bring it type notion. And so we had to build these boxes
[28:07.240 --> 28:13.060]  in a way that was field serviceable. We had to be able to quickly replace components of the system
[28:13.060 --> 28:18.680]  if it were to break or if the intent of the cyber attack against it was literally to brick it so
[28:18.680 --> 28:24.880]  that it was no longer functional. How did we restore that or replace that in as fast of a
[28:24.880 --> 28:31.680]  situation as possible to move on and not basically stop the whole exercise if something were to break?
[28:31.680 --> 28:38.440]  We had to advance our automated configuration, data extraction, and also the notion of the system
[28:38.440 --> 28:44.440]  and its observation when even the network links were no longer trustworthy or reliable to be up
[28:44.440 --> 28:49.180]  or down. Remember, we're in a black start scenario, so we're not even guaranteed that
[28:49.180 --> 28:54.020]  we'll have power on each of the substations to be able to communicate to them. And when they're
[28:54.020 --> 29:00.140]  brought up, we need to maintain the state of everything we captured as they go up and down
[29:00.140 --> 29:08.560]  like a seesaw as they're being attacked and brought up and brought back down, etc.
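
A minimal sketch of the store-and-forward idea described above: each substation records what it captures locally first, and forwards it only when a link happens to be up. The talk does not describe the actual implementation, so the class name, the SQLite schema, and the `send` callback are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable


class StoreAndForwardBuffer:
    """Hypothetical durable local queue: capture first, forward when possible."""

    def __init__(self, path: str = ":memory:") -> None:
        # Assumption: on real hardware this would be a file path, so that
        # records survive a loss of power. ":memory:" is only for the demo.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "sent INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def capture(self, payload: str) -> int:
        """Persist a record locally before any attempt to transmit it."""
        cur = self.db.execute(
            "INSERT INTO records (payload) VALUES (?)", (payload,)
        )
        self.db.commit()
        return cur.lastrowid

    def pending(self) -> int:
        """Number of records captured but not yet acknowledged upstream."""
        row = self.db.execute(
            "SELECT COUNT(*) FROM records WHERE sent = 0"
        ).fetchone()
        return row[0]

    def flush(self, send: Callable[[int, str], bool]) -> int:
        """Forward unsent records in order; stop at the first failure."""
        rows = self.db.execute(
            "SELECT seq, payload FROM records WHERE sent = 0 ORDER BY seq"
        ).fetchall()
        forwarded = 0
        for seq, payload in rows:
            if not send(seq, payload):
                break  # link went down again; keep the rest for next time
            self.db.execute("UPDATE records SET sent = 1 WHERE seq = ?", (seq,))
            self.db.commit()
            forwarded += 1
        return forwarded
```

Marking a record as sent only after the callback reports success means a substation that drops mid-transfer resends rather than loses data, at the cost of possible duplicates that the receiver would need to de-duplicate by sequence number.
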
[29:08.640 --> 29:13.840]  And then we also needed to have the environment in a way that could be recomposable, change the
[29:13.840 --> 29:18.660]  structure of the crank paths, change the behavior of the substation itself, without going through
[29:18.660 --> 29:26.320]  and recabling or rewiring everything that's in there on a hands-on nature. So what are these
[29:26.320 --> 29:33.680]  environments? So combined, I call them the substations in a box. It's two components.
[29:33.680 --> 29:38.780]  This has been built on the extensive facilities we have at the University of Illinois that I've
[29:38.780 --> 29:44.420]  alluded to. We have roughly $100 million worth of hardware and software at the University of
[29:44.420 --> 29:50.840]  Illinois that have been built up over the past decade plus, much of which by donation.
[29:51.120 --> 29:56.160]  That's enabled all sorts of research that we've done in the past with Trustworthy Cyber
[29:56.160 --> 30:01.860]  Infrastructure for Power, which was an NSF effort, a DOE DHS effort called Trustworthy Cyber
[30:01.860 --> 30:08.140]  Infrastructure for the Power Grid. It added grid to the end. Our most recent center that's wrapping
[30:08.140 --> 30:13.480]  up in the next year or two called the Cyber Resilient Energy Delivery Consortium,
[30:13.480 --> 30:19.420]  which is also DOE and DHS funded. Our Critical Infrastructure Resiliency Institute,
[30:19.420 --> 30:25.720]  and a variety of other things that have and leverage the testbed resources at Illinois.
[30:26.520 --> 30:31.240]  So the substations in a box, as I mentioned, they're designed to support this black start
[30:31.240 --> 30:36.360]  crank path analysis and deployed in the field, real grid environments built, etc.
[30:36.360 --> 30:41.460]  They're built in Pelican-style cases, so they're literally shippable and deployable anywhere we
[30:41.460 --> 30:51.100]  need to stand them up. They're generally mostly IP55 watertight when they're shipped and moved
[30:51.100 --> 30:57.340]  around. When you physically deploy them, we take the case lids off and put them in enclosures.
[30:57.340 --> 31:03.540]  The reason for that is literally so the devices inside don't overheat, but also because you do
[31:03.540 --> 31:08.720]  need some physical access to the devices to control breaker operations and other aspects.
[31:09.120 --> 31:16.200]  We built an environment on an island, so the power infrastructure of what is in the overhead
[31:16.200 --> 31:22.700]  and underground stays, but basically everything else gets torn down and built back up every six
[31:22.700 --> 31:28.680]  months in a different way. There are currently 26 variants of substations deployed across
[31:29.260 --> 31:36.180]  that infrastructure. Those substations have relays, RTUs, substation network switches, routers,
[31:36.180 --> 31:42.340]  as well as an experimental fabric underneath that's controlled by SDN that allows us to do
[31:42.480 --> 31:48.520]  a lot of the capture and, let's say, dynamic changes of the substation itself. All sorts of
[31:48.520 --> 31:54.300]  protocols are deployed. You can see a list of them up on the screen. There's both serial and
[31:54.300 --> 31:59.840]  Ethernet communications. We have custom power connections on the power boxes that allow us to
[32:00.620 --> 32:06.460]  link these systems together in a safe way. And then we have high-voltage infrastructure that
[32:06.460 --> 32:13.400]  I'll talk about as well. So what's a power box and what's in a power box? So power boxes basically
[32:13.400 --> 32:21.440]  think of it like the physical infrastructure of the island or of a real substation. So that's
[32:21.440 --> 32:27.040]  the breakers, the bus bars, the incoming and outgoing feeders on the system. We have a local
[32:27.040 --> 32:32.900]  load feeder as well. We have signalization lights that indicate what the status is of energization,
[32:32.900 --> 32:40.780]  what the status is of breakers. We have a dead bus sync light that's provided. We have analog
[32:40.780 --> 32:49.500]  sync check relays, contactors, auxiliary contacts on the systems, CTs and PTs, control circuitry
[32:49.500 --> 32:53.740]  behind that allows us to operate breakers and various other stuff. We have different modes of
[32:53.740 --> 32:59.700]  operation, sort of a safe mode where we can walk away from the system and the system can't possibly
[32:59.700 --> 33:07.360]  change, which is a sort of a unique scenario differing from the real world. And then we
[33:07.360 --> 33:13.280]  obviously have the ability to locally control breakers as well. So what does that look like?
[33:13.280 --> 33:19.060]  So there are effectively two types of power boxes inherently that we've built.
[33:19.060 --> 33:25.900]  One is a 208 volt three-phase system. One is a 480 volt three-phase system. They look
[33:25.900 --> 33:30.820]  basically identical from the front, except that the size of them is a little bit different.
[33:30.820 --> 33:36.620]  We also have these high voltage systems, which really are Hoffman enclosures that
[33:36.620 --> 33:45.840]  are wall mounted and act like, let's say, semi-intelligent breakout boards for providing
[33:45.840 --> 33:50.800]  telemetry from the high voltage gear to the corresponding devices that are then operating
[33:50.800 --> 33:56.660]  and controlling that high voltage gear. And so think of it sort of like a mapping board
[33:57.220 --> 34:06.060]  in a way. And each power box generally has an incoming circuit, a load circuit, and then two
[34:06.060 --> 34:13.820]  outgoing circuits as it's built out. So basic electrical diagrams in the middle, but nothing,
[34:13.820 --> 34:19.940]  let's say, shocking about that. And then there's the other side of it. And so each of these devices
[34:19.940 --> 34:25.840]  have the number two SO cord coming into these Hubbell connectors on the edge. But they also have
[34:25.840 --> 34:32.040]  umbilical cords, which are Amphenol connectors, mil-spec Amphenol connectors, that basically take
[34:32.040 --> 34:37.280]  all of the telemetry of what is happening inside the box and provide that telemetry to
[34:37.280 --> 34:42.760]  the devices that need to control it. So that includes the analog and digital signals that
[34:42.760 --> 34:50.420]  need to be sent back and forth between the devices to control them, but also the CTs and PT outputs,
[34:50.420 --> 34:56.100]  etc., from the system itself so that all of the sensing is detectable by the relay boxes.
[34:56.180 --> 35:01.180]  And so what are the relay boxes? Well, the relay boxes are really the brains of the substation. So
[35:01.180 --> 35:07.660]  here are a couple examples showing some of the different technology that's in play. Up in the
[35:07.660 --> 35:16.240]  upper left, those are ABB relays along with an ABB RTU. This is sort of a legacy RTU platform
[35:16.240 --> 35:23.200]  that ABB leverages or uses and has deployed around the world called the RTU-560. In the middle,
[35:23.200 --> 35:30.380]  you'll see some more ABB relays, middle-top. And above that, instead of an RTU-560, you see a
[35:30.380 --> 35:37.080]  Motorola device. This is a Motorola ACE3600. If you move to the next image, upper right-hand
[35:37.080 --> 35:45.920]  corner, that is an ABB COM-600 rack mount or a COM-600R that is acting as the RTU over those
[35:45.920 --> 35:52.840]  ABB relays that are there. Bottom left-hand corner, you'll see some touchscreen SEL-751 relays. Those
[35:52.840 --> 36:01.500]  are controlled... the RTU in that particular case is an SEL-RTAC, a 3505. In the middle,
[36:01.500 --> 36:08.200]  you'll see touchscreen relays. In that one, there's an SEL-RTAC as well, but that one is an
[36:08.200 --> 36:17.000]  SEL-3530 instead of a 3505. And then the far right, you'll see more SEL relays. These
[36:17.000 --> 36:21.880]  ones are not touchscreen. These are another variant of the SEL-751. And those ones are
[36:21.880 --> 36:29.280]  being controlled from an RTU perspective by a NovaTech Orion LX. So this shows just some of the
[36:29.280 --> 36:37.040]  diversity of platforms that are there. There are many more, obviously, across 26 substations.
[36:37.400 --> 36:43.120]  Every single substation is unique in some way, shape, or form. So we have a lot of diversity
[36:43.120 --> 36:49.060]  across the environment in terms of platforms, technologies, and configurations. And so
[36:49.060 --> 36:53.800]  diversity could be purely on the configuration side. For instance, different protocols being
[36:55.560 --> 37:04.260]  communicated, different topologies being set up between Utility A, Utility C, Utility B, etc.
[37:05.420 --> 37:12.480]  And lots of other variation on top of that. So let's talk about some, let's say, lessons learned
[37:12.480 --> 37:18.820]  from the program or challenges that we had to tackle, now that you understand some of the gear.
[37:18.820 --> 37:26.820]  First off is safety. When you can't trust anything in the system whatsoever because it is compromised
[37:26.820 --> 37:32.640]  and it's compromised in a way that you may or may not know, may or may not be able to determine,
[37:32.640 --> 37:38.280]  and you don't know what is trustable or not, all of the systems that are there are effectively
[37:38.980 --> 37:44.840]  designed to protect you. But if you can't trust the digital systems to protect you anymore,
[37:44.840 --> 37:49.160]  then you need additional layers of protection. So we had to layer protection throughout the system
[37:49.160 --> 37:55.300]  in both physical and cyber form. And that included things like
[37:56.900 --> 38:02.920]  analog protections in the system, like analog sync check relays, time-overcurrent protections,
[38:02.920 --> 38:08.700]  thermal protections. On the digital side or on the cyber side, the protective relays were configured
[38:08.700 --> 38:16.420]  with safe and sane settings for overvoltage and undervoltage conditions, etc. The boxes had arc
[38:16.420 --> 38:22.460]  flash analysis done on them to determine potential exposure or safety measures from PPE perspective
[38:22.460 --> 38:27.820]  that needed to be done. The cabinets were all either pad lockable or direct key lockable.
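
To make the sync check idea mentioned above concrete, here is a minimal sketch of the decision such a relay makes before permitting a breaker to close. The relays in the talk are analog devices; this digital version only illustrates the logic. The function names and every threshold are assumptions for illustration, not settings from the testbed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncLimits:
    # Illustrative thresholds only (assumed); real settings come from
    # a protection study for the specific system.
    max_voltage_diff_pu: float = 0.05
    max_slip_hz: float = 0.1
    max_angle_deg: float = 10.0
    dead_bus_pu: float = 0.1


def angle_difference_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def close_permitted(
    bus_v_pu: float,
    bus_hz: float,
    bus_angle_deg: float,
    line_v_pu: float,
    line_hz: float,
    line_angle_deg: float,
    limits: SyncLimits = SyncLimits(),
    allow_dead_bus: bool = True,
) -> bool:
    """Return True only if closing the breaker would be an in-sync close."""
    # Dead bus: there is nothing to synchronize against, so the only
    # question is whether energizing a dead bus is allowed at all.
    if bus_v_pu < limits.dead_bus_pu:
        return allow_dead_bus
    return (
        abs(bus_v_pu - line_v_pu) <= limits.max_voltage_diff_pu
        and abs(bus_hz - line_hz) <= limits.max_slip_hz
        and angle_difference_deg(bus_angle_deg, line_angle_deg)
        <= limits.max_angle_deg
    )
```

The point of keeping a check like this in analog hardware is that it still blocks an out-of-sync close when the digital relay asking for the close has been compromised.
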
[38:27.980 --> 38:33.900]  We had, on the high voltage side, interrupters that were acting as fail-safes if anything
[38:33.900 --> 38:39.520]  flowed through from the low voltage side to the high voltage side. If an out-of-sync close or something
[38:39.520 --> 38:45.840]  like that happened, or a surge somewhere, or a fault on the
[38:45.840 --> 38:51.260]  line, the interrupters were there to protect the system at the high voltage side. We also
[38:51.260 --> 38:58.780]  purposefully did not target the high voltage control. That way we didn't run into, let's say,
[38:58.780 --> 39:04.840]  big issues. Low voltage was something we could cause a problem on and be okay with, but on the
[39:04.840 --> 39:10.960]  high voltage side people could get hurt. All of the connectors we used were screw-in locking style
[39:10.960 --> 39:17.680]  connectors. We had all sorts of internal wiring protection, etc. So the key is that the environment
[39:17.680 --> 39:23.860]  itself was designed to be safe no matter what. So people that knew nothing about power systems
[39:23.860 --> 39:30.180]  could still safely operate in this environment and not run a risk of being electrocuted or
[39:30.180 --> 39:36.140]  whatever. We always had power engineers and safety officers effectively on site that were making sure
[39:36.140 --> 39:42.560]  that people did safe operations and maintained the necessary perimeters, even at a noise level
[39:42.560 --> 39:53.840]  from the generators and various other aspects. But really the system protected itself in every way.
[39:53.840 --> 40:00.780]  So let's talk about some operational lessons that we learned in executing exercises for
[40:00.780 --> 40:07.540]  black start restoration, but also in austere environments under conditions of blackouts,
[40:07.540 --> 40:13.780]  etc. And so let me talk a little bit about the mode of execution as we go through this. So
[40:13.780 --> 40:18.580]  when I say that we're operating utility environments, we are actually operating
[40:18.580 --> 40:24.760]  the utility environments. And by that I mean utility operators from real utilities come to
[40:24.760 --> 40:32.460]  the island and they run the infrastructure. And they basically take control of it, we hand it
[40:32.460 --> 40:37.580]  over to them, and then they tell everyone else what to do, how to do it, when to do it, etc.
[40:37.920 --> 40:44.720]  on the system. Now obviously we have some exercise control over that, but the intent is to really
[40:44.720 --> 40:51.020]  make this as real as possible. So let's talk about that realism. Many people don't believe
[40:51.020 --> 40:56.420]  what is possible until it really is, let's say, slapping them in the face. And by that I mean
[40:58.300 --> 41:05.320]  a common view is, look, relays, they're embedded devices, you can't make them do things, you can't
[41:05.320 --> 41:12.540]  disable their protection, you can't change their mode of operation beyond their config.
[41:13.220 --> 41:19.000]  And until they saw us do that, they didn't really believe it. Even when it was happening
[41:19.000 --> 41:23.920]  right in front of them, they still didn't believe it until they dug deeper and started to look
[41:23.920 --> 41:28.880]  deeper at what was happening and how it was happening to realize that, look, bad things
[41:28.880 --> 41:35.440]  really are possible, that right now your mindset is that this isn't possible at all, that no one
[41:35.440 --> 41:40.760]  can do that, that you'd have to physically modify the device in order to do that. And it's like, no,
[41:40.760 --> 41:47.540]  we can actually do that via cyber means. So let's talk about the people, right? One of the things
[41:47.540 --> 41:54.320]  that was an operational lesson is academia is really good about thinking outside the box,
[41:54.320 --> 42:01.360]  but we needed to be extremely agile and think even further outside of the box and push through
[42:01.360 --> 42:10.400]  and build stuff that, frankly, was, let's say, impossible to build at any given moment.
[42:10.400 --> 42:17.640]  We took on building one utility, then two, then three, all in six-month iterations. Completely
[42:17.640 --> 42:23.360]  different architectures, completely different devices, building the physical boxes, and
[42:23.360 --> 42:30.320]  standing it up in a new realistic configuration at each sort of interval. That's pretty difficult
[42:30.320 --> 42:36.060]  to do. It's multiple weeks to build the environment each time. It's multiple weeks of testing. It's
[42:36.060 --> 42:40.760]  multiple weeks of evaluation to make sure that it is as real as possible, that it doesn't have
[42:40.760 --> 42:47.580]  inherent artifacts itself that people may view as compromises, etc., in the system, that it is
[42:47.580 --> 42:55.200]  pristine, blue sky, trustable, etc., and built the way it should be built. And that results in,
[42:55.200 --> 43:01.660]  let's say, extreme levels of stress at times, but as a team, not just the University of Illinois,
[43:01.660 --> 43:06.920]  but broadly the entire program, we all pulled together and found success in every exercise that
[43:06.920 --> 43:13.260]  we had. There was no exercise that failed because the infrastructure or the people failed to deliver.
[43:14.700 --> 43:21.660]  And so that brings us to pace. So I mentioned every six months. So DARPA programs move very
[43:21.660 --> 43:27.820]  fast, and the expectations are very high of the technology, of the evaluation, of the test bed,
[43:27.820 --> 43:36.820]  etc. And so there were many times that we were facing failure. But let's say by pure blunt force,
[43:36.820 --> 43:45.800]  and by pure blunt force, I mean number of hours and long days and leveraging and leaning on each
[43:45.800 --> 43:52.180]  other throughout the program, we were able to pull through and pull off what seemed to be impossible
[43:52.980 --> 43:59.420]  when we started. So those are some just operational aspects that we learned as we
[43:59.420 --> 44:05.060]  built the environment, especially out on an island. And so when I say, you know, what are some challenges,
[44:05.060 --> 44:11.240]  right? So there were times when we were, you know, we took a ferry every day from the mainland to the
[44:11.240 --> 44:16.360]  island, and there were times when, you know, nor'easters were blowing in and other aspects where
[44:16.360 --> 44:24.820]  the waves were 12, 14 feet high, and the, you know, conditions were such that if you go to the island,
[44:24.820 --> 44:29.820]  you're going to be sleeping there because you're probably not going to make it off. And that's
[44:29.820 --> 44:36.360]  kind of, you know, let's say a bit much, but there were many days that happened like that.
[44:36.360 --> 44:41.740]  There were days we were out there executing the environment, trying to restore after the cyber
[44:41.740 --> 44:48.480]  attack, and, you know, there were nor'easters blowing through with 40 plus mile an hour winds
[44:48.480 --> 44:55.740]  and downpours and inches of rain falling an hour, etc. So not just hard from a technical perspective,
[44:55.740 --> 45:02.060]  but even harsh environments that we were out in as we were trying to restore these devices,
[45:02.680 --> 45:06.580]  and as the teams and the utility operators were operating these devices
[45:07.380 --> 45:12.320]  faced with those types of environmental conditions. So we learned a whole bunch of
[45:12.320 --> 45:20.540]  other lessons too. So one of them that was interesting to us was that when the systems
[45:20.540 --> 45:25.940]  are intended to break, when you know that you can't trust anything anymore, you have to think
[45:25.940 --> 45:31.580]  differently about what you can build and how you build it so that it works consistently and reliably
[45:31.580 --> 45:36.760]  even when everything is intended to be broken. So that goes back to some of the stuff we did
[45:36.760 --> 45:44.020]  with safety, but also on the cyber side. So as an example, we had no guarantee of power,
[45:44.020 --> 45:49.780]  reliably or not, at any point in time. So how do we guarantee we don't lose data,
[45:49.780 --> 45:56.040]  as an example, as we're going through, or that we get, let's say, eventual consistency where
[45:56.040 --> 46:01.400]  if part of the network is up and being controlled by our orchestration and another part of it is
[46:01.400 --> 46:07.840]  down, how do we get it to catch back up and get in the right configuration when it does come back up
[46:07.840 --> 46:13.440]  if it needed to be changed, or to collect all of the data that it had when it wasn't centrally
[46:14.400 --> 46:21.560]  reachable, when it was isolated and only, let's say, on backup power on its own. So that presented
[46:21.560 --> 46:26.520]  some interesting challenges that we had to tackle, which is basically fault-tolerant and distributed
[46:26.520 --> 46:33.220]  computing in a nutshell. The other issue that was somewhat interesting that we had to
[46:33.220 --> 46:38.820]  innovate on was the gear in the field typically isn't hot swappable. You can't just go grab a
[46:38.820 --> 46:43.700]  relay that's in a real substation and pull it out and pop another one in, in most of the substations,
[46:43.700 --> 46:51.000]  without having to do some rewiring, without having to take circuits out of commission,
[46:51.000 --> 46:56.140]  de-energize them, lock out, tag out procedures, etc. But if you're in a fast-paced exercise
[46:56.140 --> 47:01.340]  environment, we have seven days, and if people get stuck and we need to reverse out basically
[47:01.340 --> 47:07.440]  what the bad people did, how do we reverse that out in as fast a way as possible without bringing
[47:07.440 --> 47:13.200]  the whole exercise down or the whole exercise to a halt for hours on end? So we had to create
[47:13.200 --> 47:17.440]  sort of quick connects and all sorts of other things that allowed us to swap gear out very,
[47:17.440 --> 47:25.080]  very quickly. Further, when we're testing bleeding edge technology, things fail because sometimes
[47:25.080 --> 47:31.160]  that technology doesn't work, or the research doesn't do what it's supposed to do. And so
[47:31.160 --> 47:35.300]  that's your primary plan is, okay, look, it's going to try this, and if it works, great.
[47:35.300 --> 47:39.620]  Well, you have to have a backup plan, right? And in many cases, we had to have another backup to
[47:39.620 --> 47:46.660]  the backup plan because we couldn't fail and make everything stop working. So if anyone in the
[47:46.660 --> 47:52.980]  program failed, we had to have a backup plan for what would happen if they failed, and then a backup
[47:52.980 --> 48:00.680]  plan for if our backup plan failed. And then the last one, which is a lesson learned,
[48:00.680 --> 48:07.980]  is we thought we were prepared on many, many, many occasions. But let's say, you know, in between days
[48:07.980 --> 48:13.940]  of the exercise, when we would pause and go back to our hotels to sleep, etc., I would often make
[48:13.940 --> 48:21.020]  runs to Home Depot or Lowe's or electrician stores or whatever, because no matter how many spare
[48:21.020 --> 48:28.000]  parts we had, no matter what tools we had on us or available to us, anything that can break
[48:28.000 --> 48:33.220]  will break at some point when you're running these environments. And it's almost always
[48:33.220 --> 48:38.120]  in the way that you never thought of or that you couldn't have anticipated. Something that has,
[48:38.120 --> 48:44.460]  you know, a 10,000-hour mean time between failure or 100,000-hour mean time between failure fails in,
[48:44.460 --> 48:50.600]  you know, 10 hours instead. So lots of interesting challenges on just keeping
[48:50.600 --> 48:56.440]  things up and operational and making sure that stuff was readily available if anything did break.
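
A rough worked example of why a part with a long mean time between failures can still fail within hours once there are many parts. It assumes a constant failure rate (an exponential model) and independent, identical parts. The 1,000-part count is a hypothetical figure, not a number from the talk.

```python
import math


def prob_failure_within(hours: float, mtbf_hours: float) -> float:
    """P(one part fails within `hours`), assuming a constant failure rate."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def prob_any_failure(hours: float, mtbf_hours: float, n_parts: int) -> float:
    """P(at least one of n independent, identical parts fails within `hours`)."""
    return 1.0 - math.exp(-n_parts * hours / mtbf_hours)


# One part with a 10,000-hour MTBF over a 10-hour exercise day: about 0.1%.
single = prob_failure_within(10, 10_000)

# A hypothetical 1,000 such parts over the same 10 hours: about 63%.
fleet = prob_any_failure(10, 10_000, 1_000)

print(f"single part: {single:.4%}, any of 1,000 parts: {fleet:.1%}")
```

Under these assumptions an early failure somewhere in the environment is the expected case, which is consistent with the experience described above of needing spares on hand no matter how reliable each part is on paper.
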
[48:56.940 --> 49:00.780]  So let's talk a little bit about some personal takeaways, and I'll wrap these up really quickly
[49:00.780 --> 49:08.320]  here. But let's talk, let's say, not as the University of Illinois, but let's talk about
[49:08.320 --> 49:15.520]  this from my perspective. So vendors, be this ICS cyber solutions or the vendors themselves,
[49:15.520 --> 49:23.320]  they often claim capabilities that, let's say, are much more limited in the real world application
[49:23.320 --> 49:28.620]  than what most people realize. So a utility may say, I go buy this platform and it's got me
[49:28.620 --> 49:34.120]  covered and it can do all of these things and I can check that box and I'm good. The reality is,
[49:34.760 --> 49:42.740]  no matter what vendors claim, there's often still very large gaps there. And so even the
[49:42.740 --> 49:46.740]  commercial off-the-shelf ICS cyber solutions that are out there, they're missing huge amounts of
[49:46.740 --> 49:52.900]  surface area on what adversaries can do against these boxes. RADIX technology was designed to
[49:52.900 --> 49:57.960]  close some of those gaps, but not all of them, right? So even as great as the RADIX technology
[49:57.960 --> 50:05.080]  is, we still have a long way to go, right? In building the environments that we built and
[50:05.080 --> 50:12.640]  helping to construct the exercise evaluations and the test effective payloads and all that,
[50:12.640 --> 50:16.920]  in essence, is how do you cause a condition from a cyber means to happen on these devices,
[50:16.920 --> 50:23.000]  such that there are artifacts or implementations that then people can forensically find, right?
[50:23.520 --> 50:28.840]  So all of those, in doing that, we found hundreds of issues on these devices. And it's not like
[50:28.840 --> 50:33.340]  we've never looked at these devices before. And by issues, I don't necessarily mean security
[50:33.340 --> 50:39.100]  vulnerabilities, but in some cases it was security vulnerabilities. But merely other items like,
[50:39.100 --> 50:44.700]  look, compatibility between host A and host B, it doesn't work. Even though they're supposed
[50:44.700 --> 50:49.500]  to interoperate, it doesn't. There's nuances, differences, there's differences in documentation
[50:49.500 --> 50:55.280]  on what is communicated on the actual wire versus what it says it's going to communicate as.
[50:56.200 --> 51:04.120]  Another thing, which was a personal takeaway, is being the adversary is fun, right? It's nice
[51:04.120 --> 51:10.820]  to be on the red team and hack these devices and break them in a variety of ways. But in the
[51:10.820 --> 51:15.940]  program, the defense and recovery technology, including the RADIX technology and commercial
[51:15.940 --> 51:23.520]  off-the-shelf stuff that I've personally seen, it's still way behind what I was capable of or
[51:23.520 --> 51:29.660]  what others on the red team were capable of doing to these systems. So if we're so far ahead,
[51:29.660 --> 51:36.020]  we could just obliterate any of that tech, then what fun is there in that? It's not even a fair
[51:36.020 --> 51:41.900]  fight at that point. So a lot of times we limited the activities that we were doing to poking the
[51:41.900 --> 51:47.840]  bear rather than destroying it. So we didn't go out for the throat kill. We kept pace with
[51:47.840 --> 51:52.500]  what the technology was capable of and designed something that was just a little bit ahead of that,
[51:52.500 --> 52:00.820]  pushing them each iteration to improve. And so one sort of broad takeaway is,
[52:00.820 --> 52:06.740]  and I say this not just of technology in North America or whatever, but the whole world, what
[52:06.740 --> 52:12.480]  they have in this space in industrial control systems, detecting cyber attacks on these devices,
[52:12.480 --> 52:17.120]  that detection, that mitigation, and that remediation technology for the electric power
[52:17.120 --> 52:24.860]  grid. And I'll stretch it a bit and say actually all critical infrastructure still really has a
[52:24.860 --> 52:31.280]  long way to go. There's still so much work that needs to be put on those platforms to really
[52:31.280 --> 52:37.160]  protect the systems from somebody who is truly focused and determined and understands these
[52:37.160 --> 52:44.200]  systems at a deep and inherent level. So that's all of my technical content. I will just flash
[52:44.380 --> 52:49.960]  a slide real quick, which is the testbed at Illinois and much of what we've built in the
[52:49.960 --> 52:56.300]  RADIX program has been enabled by lots of companies. These are some of the
[52:56.300 --> 53:01.300]  companies that have donated gear to us, software to us, et cetera, that have helped
[53:01.300 --> 53:07.680]  enable the things that we've done. But without them and without the commercial support and
[53:07.680 --> 53:13.260]  the vendor support and the utility support broadly across this program, we wouldn't be successful.
[53:13.260 --> 53:17.260]  The DARPA RADIX program wouldn't have been successful. So thank you to all of those companies
[53:17.260 --> 53:24.600]  and what they did. And then I think we still have a couple minutes for questions. And so
[53:24.600 --> 53:30.680]  I'll also leave a sort of bonus link down there at the bottom. There's a GitHub repo that I created
[53:30.880 --> 53:37.740]  a number of years ago at the S4 conference, and I've been maintaining it sort of ad hoc
[53:37.740 --> 53:43.440]  ever since. And that's a bunch of ICS security tools in a variety of different forms that are
[53:43.440 --> 53:50.460]  aggregated and categorized in various ways and mirrored when their original location
[53:50.460 --> 53:55.640]  is no longer available. So do check that out. It has a whole bunch of useful things in it.
[53:55.640 --> 54:01.040]  And with that, I will stop talking and I think we might have a little bit of time for question
[54:01.040 --> 54:10.910]  and answer. Let's see. Hey, Tim, this is Bryson. How are you doing? Wonderful. Hi, Bryson.
[54:11.070 --> 54:17.030]  Thank you. Thank you for the talk. I noticed that you are on our Discord and I actually
[54:17.030 --> 54:22.570]  promoted you to a speaker during your talk. So you should now have that badge tied to you.
[54:22.790 --> 54:29.750]  And what we recommend is if you could post that GitHub link in the speaker Q&A
[54:30.550 --> 54:38.630]  section of Discord. Okay. And then reach out for folks to engage you there and for questions.
[54:38.930 --> 54:44.710]  Okay. Sounds good. So really appreciate you jumping on and giving this talk.
[54:45.370 --> 54:51.730]  Having been and seen this for myself, it is really impressive. And my favorite part for
[54:51.730 --> 54:58.570]  the sensors was of course the inflatable guys like you see next to the used car sales. I thought
[54:58.570 --> 55:02.690]  that was a really interesting way to show whether something is up or down at a physical distance.
[55:02.950 --> 55:05.510]  Yeah, we affectionately call them the dancing men.
[55:07.770 --> 55:14.170]  And sort of a funny aside, in the nor'easters, when you have torrential downpour, those things
[55:14.170 --> 55:18.950]  get wet and then they turn into sort of like whiplashes. So I was up there, you know,
[55:18.950 --> 55:25.630]  untangling them on many occasions and getting sort of smacked by the dancing men as I was trying to
[55:25.630 --> 55:30.130]  untangle them so that they could fly. Well, that's how you know it works.
[55:32.720 --> 55:38.560]  Well, anyway, Tim, I appreciate you joining us and supporting the village and look forward to
[55:38.560 --> 55:43.700]  the commentary and the Q&A on Discord. Awesome. Yeah. And everyone do check out
[55:43.700 --> 55:47.980]  what they've set up for, you know, the CTFs and other things in the village.
[55:47.980 --> 55:52.540]  You know, they do an awesome job of creating environments that you can play with.
[55:52.660 --> 55:58.160]  Sadly, I can't offer my environment out to the world in such an easy and accessible way.
[55:58.200 --> 56:02.880]  But hopefully in the future, I'll get deeper engaged with the ICS village and bring some
[56:02.880 --> 56:07.680]  of this tech and some of this capability to the village so you guys can all play with it too.
[56:08.100 --> 56:12.380]  Yeah, we look forward to that. That's probably been the biggest innovation we've had is because
[56:12.380 --> 56:17.720]  of the pandemic. The amount of effort we've had to spend on figuring out how to make these things
[56:17.720 --> 56:23.660]  virtually accessible, which is the typical limitation for concurrent access. We kind of
[56:23.660 --> 56:27.740]  have solved it. So we'd love to touch base with you afterward and talk about it.
[56:27.800 --> 56:32.820]  Yeah, it's, you know, great to hear that you guys have had to tackle that. We're currently
[56:32.820 --> 56:39.420]  tackling that for the DARPA RADIX effort as well. With the last exercise, it will be predominantly
[56:39.420 --> 56:46.700]  remote. And we went from the prior exercise being, you know, network isolated, completely trusted,
[56:46.700 --> 56:52.200]  everyone in a specific zone to, you know, part of this is deployed in the cloud, and everyone
[56:52.200 --> 56:59.400]  is distributed around the country, and all accessing this in a controlled and crazy way,
[56:59.400 --> 57:04.720]  including like streaming body cams that we'll have and all sorts of stuff. So we're on that
[57:04.720 --> 57:12.060]  same roller coaster due to COVID at the moment. Well, we look forward to collaborating. So again,
[57:12.060 --> 57:17.600]  thank you very much, and we'll see you on Discord. Yeah, you guys have a great day. All right, take care.
